
gh-148085: datetime cache time module lookups#148088

Open
maurycy wants to merge 7 commits into python:main from maurycy:datetime-cache-time-module-lookups

@maurycy
Contributor

@maurycy maurycy commented Apr 4, 2026

See gh-148085 for more details.

This takes a lazy approach, in the spirit of PEP 810 lazy imports, instead of preloading in init_state(). The trade-off is decreased readability.
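For readers less familiar with the C side, here is a rough pure-Python analogue of the two options. It is only a sketch: the class names and the time_time attribute are hypothetical stand-ins for the C module state and its st->time_time field, and importlib.import_module plays the role of the C-level import call.

```python
import importlib


class LazyState:
    """Hypothetical module state that resolves time.time on first use."""

    def __init__(self):
        # Nothing is imported yet, like a NULL st->time_time field.
        self.time_time = None

    def get_time(self):
        if self.time_time is None:
            # First call pays for the import; later calls reuse the cache.
            self.time_time = importlib.import_module("time").time
        return self.time_time


class EagerState:
    """Hypothetical module state that resolves time.time up front."""

    def __init__(self):
        # The import happens once at state creation, as init_state() would do it.
        self.time_time = importlib.import_module("time").time

    def get_time(self):
        return self.time_time
```

The lazy variant defers the import until it is actually needed, at the cost of a NULL check on every call; the eager variant keeps the hot path trivial but pays the import up front.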

Please see the "Links to previous discussion of this feature" section in gh-148085. My understanding is that only @ericsnowcurrently's work in gh-119810 made this possible and clean.

Benchmark

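+--------------------+--------------+--------------------------------------------+
| Benchmark          | main-c50d6cd | datetime-cache-time-module-lookups-ae32bfd |
+====================+==============+============================================+
| datetime.timetuple | 260 ns       | 124 ns: 2.10x faster                       |
+--------------------+--------------+--------------------------------------------+
| datetime.strftime  | 784 ns       | 517 ns: 1.52x faster                       |
+--------------------+--------------+--------------------------------------------+
| date.timetuple     | 258 ns       | 124 ns: 2.08x faster                       |
+--------------------+--------------+--------------------------------------------+
| Geometric mean     | (ref)        | 1.88x faster                               |
+--------------------+--------------+--------------------------------------------+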

Benchmark script:

import pyperf, datetime

r = pyperf.Runner()
dt = datetime.datetime(2026, 4, 4, 12, 30, 45)
d = datetime.date(2026, 4, 4)

r.bench_func("datetime.timetuple", dt.timetuple)
r.bench_func("datetime.strftime", dt.strftime, "%Y-%m-%d")
r.bench_func("date.timetuple", d.timetuple)

Both builds used the same configure options:

2026-04-04T14:17:02.794335000+0200 maurycy@gimel /Users/maurycy/work/cpython-main (main c50d6cd) % ./python.exe -c "import sysconfig; print(sysconfig.get_config_var('CONFIG_ARGS'))"
'--enable-optimizations' '--with-lto'
[127] 2026-04-04T14:17:24.011667000+0200 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython (datetime-cache-time-module-lookups a49f862) % ./python.exe -c "import sysconfig; print(sysconfig.get_config_var('CONFIG_ARGS'))"
'--enable-optimizations' '--with-lto'

Member

@StanFromIreland StanFromIreland left a comment


I don't think we have a precedent for lazy caching; isn't it always eager? I'm worried this might cause issues: historically, datetime and caching haven't worked out.

@maurycy
Contributor Author

maurycy commented Apr 4, 2026

@StanFromIreland We do:

https://github.com/python/cpython/blob/dea4083aa95/Modules/arraymodule.c#L2561

https://github.com/python/cpython/blob/dea4083aa95/Modules/_decimal/_decimal.c#L3699

I copied my approach from there. I'm OK with moving it to init_state(); PEP 810 just somehow set the tone.

I believe the reason it didn't work before is that it relied on a global variable instead of per-interpreter module state. I left a note in the "Links to previous discussion of this feature" section of #148085.

return NULL;
}
if (st->time_time == NULL) {
st->time_time = PyImport_ImportModuleAttrString("time", "time");
Member


Thinking more about this, I think we could just replace this with a direct call to PyTime_Time; I assume this code predates the existence of that C API.

Contributor Author

@maurycy maurycy Apr 6, 2026


@StanFromIreland Thank you for taking a look. This is definitely a great simplification:

d3944fe

I updated the benchmark:

import datetime
import pyperf

r = pyperf.Runner()
dt = datetime.datetime(2026, 4, 4, 12, 30, 45)
d = datetime.date(2026, 4, 4)

class MyDate(datetime.date):
    pass

r.bench_func("datetime.timetuple", dt.timetuple)
r.bench_func("datetime.strftime", dt.strftime, "%Y-%m-%d")
r.bench_func("date.timetuple", d.timetuple)
r.bench_func("date.today", datetime.date.today)
r.bench_func("date_subclass.today", MyDate.today)

This is the result:

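+---------------------+--------------+--------------------------------------------+
| Benchmark           | main-fbdbea9 | datetime-cache-time-module-lookups-d3944fe |
+=====================+==============+============================================+
| datetime.timetuple  | 264 ns       | 130 ns: 2.04x faster                       |
+---------------------+--------------+--------------------------------------------+
| datetime.strftime   | 776 ns       | 524 ns: 1.48x faster                       |
+---------------------+--------------+--------------------------------------------+
| date.timetuple      | 263 ns       | 129 ns: 2.04x faster                       |
+---------------------+--------------+--------------------------------------------+
| date.today          | 78.4 ns      | 76.0 ns: 1.03x faster                      |
+---------------------+--------------+--------------------------------------------+
| date_subclass.today | 306 ns       | 149 ns: 2.05x faster                       |
+---------------------+--------------+--------------------------------------------+
| Geometric mean      | (ref)        | 1.67x faster                               |
+---------------------+--------------+--------------------------------------------+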

The date.today result is expected, since it already takes a fast hot path; date_subclass.today, however, shows a very good gain from bypassing the import.
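The gap between the two today rows is consistent with the subclass path doing more work: for a date subclass, today() obtains the current time and then calls the class's own fromtimestamp, as the pure-Python implementation does with cls.fromtimestamp(time.time()). The sketch below only checks that observable behavior from Python; CountingDate is a hypothetical name, and nothing here measures the C fast path itself.

```python
import datetime


class CountingDate(datetime.date):
    """Hypothetical date subclass that counts fromtimestamp() calls."""

    fromtimestamp_calls = 0

    @classmethod
    def fromtimestamp(cls, t):
        # today() on a subclass is expected to route through this classmethod.
        cls.fromtimestamp_calls += 1
        return super().fromtimestamp(t)


# One call to today() on the subclass.
today = CountingDate.today()
```

Because the subclass path fetches the time and then dispatches to fromtimestamp, any per-call overhead in fetching the time shows up directly in date_subclass.today.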

The downside, as @picnixz pointed out, is that this now definitely breaks hooks that rely on removing the module from sys.modules.

On the other hand, we don't seem to particularly care about monkey-patching:

st->codecs_encode = PyImport_ImportModuleAttrString("codecs", "encode");

state->array_reconstructor = PyImport_ImportModuleAttrString(

state->_tzpath_find_tzfile =

state->asyncio_mod = PyImport_ImportModule("asyncio");

state->PyDecimal = PyImport_ImportModuleAttrString("_pydecimal", "Decimal");

etc.

@picnixz
Member

picnixz commented Apr 4, 2026

I remember there were issues with datetime and the capsule API, so be careful there. Likewise, if you do per-state caching, it may break existing hooks (removing the module from sys.modules will not make it reload, I think).

@maurycy
Contributor Author

maurycy commented Apr 6, 2026

Likewise, if you do per-state caching, it may break existing hooks (removing the module from sys.modules will not make it reload I think)

How real a concern is this?

I've grepped the code and I think we already tend to break monkey-patching:

st->codecs_encode = PyImport_ImportModuleAttrString("codecs", "encode");

state->array_reconstructor = PyImport_ImportModuleAttrString(

state->_tzpath_find_tzfile =

state->asyncio_mod = PyImport_ImportModule("asyncio");

state->PyDecimal = PyImport_ImportModuleAttrString("_pydecimal", "Decimal");

I definitely do not want to cause any regression, but this seems to be a pattern, and the performance gain (on repeated runs) is real.

@picnixz
Member

picnixz commented Apr 6, 2026

I would say it could break in tests: it's not uncommon to patch time-related modules when working with datetime. I honestly don't know whether this is a real issue, but I wanted you to be aware of the possibility. In the past, I added NEWS entries noting that imports were changed for performance reasons, just so users know that the reference is no longer looked up locally/globally. In this case, it would be easy to fix: mock time first, then import datetime, instead of the other way around.
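A small pure-Python illustration of the concern: a function that looks up time.time on every call sees a unittest.mock.patch, while one that captured the function earlier keeps calling the original. It also models the suggested workaround, where the patch is applied before the reference is captured. The helper names are hypothetical, and this models caching a Python-level reference, not the C code itself.

```python
import time
from unittest import mock

# Captured once when this module runs, similar in spirit to caching
# the function in module state.
_cached_time = time.time


def now_cached():
    return _cached_time()


def now_lookup():
    # Resolves time.time at call time, so an active patch is visible.
    return time.time()


def make_clock():
    # Hypothetical stand-in for "import datetime": captures whatever
    # time.time is at this moment.
    captured = time.time
    return lambda: captured()


with mock.patch("time.time", return_value=42.0):
    lookup_value = now_lookup()  # 42.0: the patch is seen
    cached_value = now_cached()  # real clock: the patch is bypassed
    clock = make_clock()         # captured while the patch is active

# The clock created under the patch keeps returning the mocked value.
clock_value = clock()
```

Note that a direct C call such as PyTime_Time does not go through the time module at all, so no Python-level patch of time.time would reach it regardless of ordering.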

@htjworld

htjworld commented Apr 7, 2026

Hi @maurycy, great work on this.

Since this module declares Py_MOD_GIL_NOT_USED, I was wondering about the lazy init pattern in a free-threaded build. If two threads enter build_struct_time concurrently while st->time_struct_time == NULL, both would call PyImport_ImportModuleAttrString, and the second store would overwrite the first, leaking a reference whose refcount is never decremented.

I noticed arraymodule.c and _decimal.c have the same pattern, so maybe this is simply accepted. But _asynciomodule.c initializes eagerly in module_init(), which avoids this structurally. Was lazy initialization chosen deliberately here over eager init in init_state()?
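One way to make the lazy pattern race-free is double-checked locking: read the cached value without the lock on the hot path, and take a lock only to fill it. The sketch below is a pure-Python model under stated assumptions: ModuleState and _import_struct_time are hypothetical stand-ins for the C module state and the PyImport_ImportModuleAttrString call that fills st->time_struct_time. A C version would need an equivalent mutex, or an atomic compare-and-swap that releases the losing thread's reference; eager initialization in init_state() sidesteps the question entirely.

```python
import importlib
import threading


class ModuleState:
    """Hypothetical per-module state with a lazily filled struct_time slot."""

    def __init__(self):
        self._lock = threading.Lock()
        self.time_struct_time = None
        self.import_calls = 0  # how many times the expensive lookup ran

    def _import_struct_time(self):
        # Stand-in for the C-level import of time.struct_time.
        self.import_calls += 1
        return importlib.import_module("time").struct_time

    def get_struct_time(self):
        value = self.time_struct_time
        if value is not None:
            return value  # fast path: already initialized, no lock taken
        with self._lock:
            # Re-check under the lock: another thread may have won the race.
            if self.time_struct_time is None:
                self.time_struct_time = self._import_struct_time()
            return self.time_struct_time


# Hit the first call from several threads at once.
state = ModuleState()
barrier = threading.Barrier(8)
results = []
results_lock = threading.Lock()


def worker():
    barrier.wait()  # release all threads together to maximize contention
    value = state.get_struct_time()
    with results_lock:
        results.append(value)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the re-check inside the lock, two threads could both see None, both perform the lookup, and the second store would overwrite the first, which is the leaked-reference scenario described above in C terms.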


4 participants